


US 20010014860A1 

(19) United States 

(12) Patent Application Publication (10) Pub. No.: US 2001/0014860 A1

Kivimaki (43) Pub. Date: Aug. 16, 2001



(54) USER INTERFACE FOR TEXT TO SPEECH 
CONVERSION 

(76) Inventor: Mikfl Kivimaki, Helsinki (FI) 

Correspondence Address: 

ANTONELLI TERRY STOUT AND KRAUS 

SUITE 1800 

1300 NORTH SEVENTEENTH STREET 
ARLINGTON, VA 22209 

(21) Appl. No.: 09/739,792 

(22) Filed: Dec. 20, 2000 

(30) Foreign Application Priority Data 

Dec. 30, 1999 (GB) 9930745.6 

Publication Classification 

(51) Int. Cl.7 G10L 13/00; G10L 15/26;

G10L 13/08 



(52) U.S. Cl. 704/260; 704/235



(57) 



ABSTRACT 



An electronic device (2) is disclosed which comprises a 
speech synthesizer (6; 16) including a loudspeaker (6), 
arranged to convert an input dependent upon punctuated 
text, to an audio output representative of a human vocally 
reproducing the text. It also comprises a user input device 
(4) for inputting instructions to navigate through text, 
between positions defined by punctuation identifiers of the 
text, to a desired position, and a controller (14) arranged to 
control navigation to the desired position and provide the 
speech synthesizer with an input corresponding to a portion 
of the text from the desired position, in response to input 
navigation instructions. 
USER INTERFACE FOR TEXT TO SPEECH
CONVERSION 

BACKGROUND OF THE INVENTION 

[0001] The present invention relates to a user interface for a
device which provides text to speech synthesis.

[0002] The synthesis of human speech using electronic 
devices is a well developed and published technology and 
various commercial products are available. Typically speech 
synthesis programs convert written input to spoken output 
by automatically generating synthetic speech and speech 
synthesis is therefore often referred to as "text-to-speech" 
conversion (TTS). 

[0003] There are several problems in speech synthesis 
which, as yet, have not been satisfactorily resolved. One 
problem is the difficulty in comprehension of the synthetic 
speech by a user. This problem may be exacerbated in 
mobile electronic devices such as mobile telephones or 
pagers which may have limited processing resources. 

[0004] It would be desirable to improve the level of 
comprehension a user has of the speech output from such 
speech synthesiser systems. 

SUMMARY OF THE INVENTION 

[0005] According to one aspect of the present invention, 
there is provided an electronic device comprising a speech 
synthesizer including a loudspeaker, arranged to convert an 
input dependent upon punctuated text, to an audio output 
representative of a human vocally reproducing the text; a 
user input device for inputting instructions to navigate 
through text, between positions defined by punctuation 
identifiers of the text, to a desired position; and a controller 
arranged to control navigation to the desired position and 
provide the speech synthesizer with an input corresponding 
to a portion of the text from the desired position, in response 
to input navigation instructions. 

[0006] Such a device provides the user with a means for 
navigating through text thereby selecting desired portions to 
be output audibly by the speech synthesiser. Further, since 
the navigation is between punctuation identifiers, the
portions of text are split logically, enabling the user to put
individual words into context more easily. Thus, the
intelligibility of the audio output to the user is improved.

[0007] The punctuation identifiers may be punctuation 
marks provided in the text, and/or other markers. The 
electronic device may use punctuation identifiers which 
identify the beginning of sentences, such as a full stop
(period), exclamation mark, question mark, capital letter or
consecutive spaces. Alternatively, the punctuation identifiers
may be marks such as a comma, colon, semi-colon, or dash 
which are also used to separate words in text into logical 
units. Also, the input text can include special characters for 
this purpose. The creator of the text may, for example, use 
special characters to mark words which may be difficult and 
thus need to be replayed, when he foresees intelligibility 
problems. 

[0008] The electronic device may comprise a display for 
presenting a text portion which the user can refer to in order to confirm
his understanding of the audio output. 



[0009] The device may be arranged to navigate backwards 
through the text, thereby providing a function for repeating 
a portion of text. The device may respond to a repeat or
backward command input by a user by having the controller
navigate backwards to a position defined by a
predetermined punctuation identifier, so as to repeat the
portion of text from that position.

[0010] The predetermined punctuation identifier may be 
the first punctuation identifier in the backwards sequence or 
alternatively a second or further punctuation identifier in the 
backwards sequence. However, preferably the navigation 
depends on how quickly the repeat command is made after 
the audio output corresponding to the first punctuation 
identifier in the backwards sequence. According to such an 
embodiment, the device may determine this based on the 
length of text and/or the length of time for audible repro- 
duction of the text between the current position and the 
position defined by the first punctuation identifier in the 
backwards sequence. If the length is below a threshold (for
example, five words or two seconds), the controller is
arranged to navigate backwards to a position defined by the
second punctuation identifier in the backward sequence.

[0011] The speech synthesiser may repeat the text more 
slowly than a default speed. This has the advantage of 
further improving the comprehensibility of the repeated 
synthesised speech. If the device comprises a display, the
default speed may be the speed at which text is presented on
the display. Alternatively, the default speed may be the
normal output speed of the speech synthesiser.

[0012] Alternatively, or in addition to the backward navi- 
gation, the device may be arranged to navigate forwards 
through the text. In this way, it can jump forwards past a 
portion of the text. The device responds to a forward or skip
command input by a user by having the controller navigate
forwards to a position defined by a predetermined
punctuation identifier, so as to skip the portion of text
between the current position and that position. In other
words, it jumps ahead and provides audio output from the
position defined by that predetermined punctuation identifier.

[0013] The predetermined punctuation identifier may be 
the first punctuation identifier in the forward sequence, or 
alternatively a second, or a further, punctuation identifier in 
the forward sequence. However, preferably the navigation 
depends on how soon the audio output corresponding to the 
next punctuation identifier would occur in the absence of the 
skip command. According to such an embodiment, the 
device may determine this based on the length of text and/or 
the length of time for audible reproduction of the text 
between the current position and the position defined by the 
first punctuation identifier in the forward sequence. If the 
length is below a threshold, the controller is arranged to 
navigate forwards to a position defined by a second punc- 
tuation identifier in the forward sequence. 

[0014] There are a number of ways in which a user can 
input his instructions. In one embodiment, the user may 
input instructions via a user input comprising a key means. 
The key means may be a user actuable device such as a key, 
a touch screen of the display, a joystick or the like. The key 
means may comprise a dedicated instruction device. If the 
device provides for forward and backward navigation, then 
it may comprise separate dedicated navigation instruction 
devices. That is, one for forward navigation, and one for 
backward navigation. 
[0015] The control means may determine the number of
device actuations and determine the position of the
punctuation identifier associated with that number of actuations.
For example, pressing the dedicated key associated with
backward navigation twice could cause the device to
navigate to the position of the second punctuation
identifier back.

[0016] Alternatively, the position of the punctuation identifier
may be determined by the length of time the dedicated key
is depressed.
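Paragraphs [0015] and [0016] leave open how key actuations are turned into a number of punctuation identifiers. A minimal sketch, assuming press timestamps in seconds and two hypothetical tuning values (`MULTI_PRESS_WINDOW_S`, `SECONDS_PER_STEP`) that the text does not specify, could convert either a burst of presses or a hold duration into the number of identifiers to move:

```python
# Hypothetical tuning values; the specification gives no figures for these.
MULTI_PRESS_WINDOW_S = 0.6   # presses closer together than this form one burst
SECONDS_PER_STEP = 0.5       # each further half-second of holding adds one identifier


def steps_from_presses(press_times: list[float],
                       window: float = MULTI_PRESS_WINDOW_S) -> int:
    """Number of identifiers to move, counted from the latest burst of presses.

    press_times holds the timestamps (seconds) of successive presses of the
    dedicated navigation key, oldest first. A gap of `window` or more starts
    a new burst, so only the presses of the final burst are counted.
    """
    if not press_times:
        return 0
    count = 1
    for earlier, later in zip(press_times, press_times[1:]):
        count = count + 1 if later - earlier < window else 1
    return count


def steps_from_hold(hold_seconds: float,
                    per_step: float = SECONDS_PER_STEP) -> int:
    """Number of identifiers to move for a key held `hold_seconds` (at least one)."""
    return 1 + int(max(0.0, hold_seconds) // per_step)


# Two quick presses of the backward key: go back two identifiers, as in [0015].
print(steps_from_presses([10.0, 10.3]))  # 2
print(steps_from_hold(1.2))              # 3
```

Grouping presses by a time window means a second press shortly after the first extends the same instruction, rather than being treated as a new one.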

[0017] Alternatively, the key means may comprise a 
multi-function key. One function of this key is selecting a 
navigation instruction. The navigation instruction itself may 
be provided by the user inputting it, or via a menu option. In 
either case, the multi-function key is used to select the 
navigation instruction. 

[0018] Instead of, or in addition to the key means, the user 
input device may comprise a voice recognition device. Such 
a voice recognition device typically provides navigation 
instructions by way of a voice command. 

[0019] The electronic device may be a document reader, a 
portable communications device, a handheld communica- 
tions device, or the like. 

[0020] According to another aspect of the present inven- 
tion there is provided a portable radio communications 
device comprising a speech synthesizer including a loud- 
speaker, arranged to convert an input dependent upon punc- 
tuated text, to an audio output representative of a human 
vocally reproducing the text; a user input device for input- 
ting instructions to navigate through text, between positions 
defined by punctuation identifiers of the text, to a desired 
position; and a controller arranged to control navigation to 
the desired position and provide the speech synthesizer with 
an input corresponding to a portion of the text from the 
desired position, in response to input navigation instruc- 
tions. 

[0021] The device may further comprise means for mount- 
ing in a vehicle. 

[0022] According to a further aspect of the invention, 
there is provided a document reader comprising a speech 
synthesizer including a loudspeaker, arranged to convert an 
input dependent upon punctuated text, to an audio output 
representative of a human vocally reproducing the text; a 
user input device for inputting instructions to navigate 
through text, between positions defined by punctuation 
identifiers of the text, to a desired position; and a controller 
arranged to control navigation to the desired position and 
provide the speech synthesizer with an input corresponding 
to a portion of the text from the desired position, in response 
to input navigation instructions. 

[0023] These devices may be provided in a car. If so, and 
if the device comprises key means, these are preferably 
provided on the steering wheel of the car. 

[0024] According to yet another aspect of the present 
invention there is provided a method of navigating through 
text to a desired position for audio output by a speech 
synthesizer, the method comprising detecting instructions 
input by a user to navigate through text, between positions 
defined by punctuation identifiers of the text, to a desired 
position; controlling navigation to the desired position; and
providing the speech synthesizer with an input
corresponding to a portion of the text from the desired position.

[0025] According to a still further aspect of the present 
invention there is provided a method for providing speech 
synthesis of a desired portion of text, the method comprising 
determining a desired start position from a selection defined 
by punctuation identifiers, from an instruction input by a 
user; moving to the desired start position; and outputting
speech-synthesized text from that position.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0026] Embodiments of the present invention will now be 
described by way of example with reference to the accom- 
panying drawings, of which: 

[0027] FIG. 1 illustrates an electronic device with a user 
interface having an input device and loudspeaker; 

[0028] FIG. 2 is a schematic illustration of the compo- 
nents of the electronic device illustrated in FIG. 1; 

[0029] FIG. 3 is a mobile phone according to an embodi- 
ment of the present invention; 

[0030] FIG. 4 is a schematic illustration of the compo- 
nents of the mobile phone illustrated in FIG. 3; 

[0031] FIGS. 5a and 5b illustrate the selection of
navigation commands according to an embodiment of the present
invention;

[0032] FIG. 6 illustrates the navigation through text and 
the subsequent output of selected portions of the text;

[0033] FIG. 7 illustrates various methods of inputting a 
repeat command; 

[0034] FIG. 8 illustrates a method of repeating text 
according to a preferred embodiment of the invention; and 

[0035] FIGS. 9a and 9b illustrate exemplary databases for 
controlling navigation. 

DETAILED DESCRIPTION OF THE 
INVENTION 

[0036] FIG. 1 illustrates an electronic device 2. The 
electronic device has an input device 4 and an output device 
6. The input device comprises a microphone 3 for receiving 
an audio input and a tactile input device 5. The output device 6 is
a loudspeaker which is used to broadcast synthesised
speech to a user.

[0037] The input device may receive instructions from the 
user controlling selection of the synthesised speech to be 
output by the loudspeaker 6. This may be performed either 
by way of a tactile input and/or a voice command. For 
example, a user who did not hear a portion of the speech
output by the loudspeaker 6 can instruct the device 2 to 
replay that portion, thereby improving the user's compre- 
hension. The tactile input device 5 may also be used to input 
text which may be broadcast by the loudspeaker 6 as 
synthesised speech. 

[0038] The electronic device may be any device which
requires an audio interface. It may be a computer (e.g. a
personal computer (PC)), a personal digital assistant (PDA), a
radio communications device such as a mobile radio
telephone (e.g. a car phone or handheld phone), a computer
system, a document reader such as a web browser, a text TV,
a fax, or a document browser for reading books, e-mails or
other documents or the like.

[0039] Although the input device 4 and loudspeaker 6 in 
FIG. 1 are shown as being integrated in a single unit they 
may be separate, as may be microphone 3 and text input 
device 5 of the input device 4. 

[0040] FIG. 2 is a schematic illustration of the electronic 
device 2. The device 2, in addition to having the input device 
4 and the loudspeaker 6 has a processor 12 which is 
responsive to user input commands 26 for driving the 
loudspeaker and for accessing a memory 10. The memory 10 
stores text data 24 supplied via an input 4. The processor 12 
is illustrated as two functional blocks - a controller 14 and 
a text-to-speech engine 16. The controller 14 and text-to- 
speech engine 16 may be implemented as software running 
on the processor 12. 

[0041] The text-to-speech engine 16 drives the loud- 
speaker 6. It receives the text input 18 from the controller 
and converts the text input to a synthetic speech output 22 
which is transduced by the loudspeaker 6 to soundwaves. 
The speech output may, for example, be a certain number of 
words at a time, one phrase at a time or one sentence at a 
time. 

[0042] The controller 14 reads the memory 10 and
controls the text-to-speech engine 16. Having read text
data from the memory, the controller provides it as an input 18 to
the text-to-speech engine 16.

[0043] The memory 10 stores text data which is read by
the controller 14. The controller 14 uses the text data to
produce the input 18 to the text-to-speech engine 16. Text
data is stored in the memory 10 by the input device 4. The
input device in this example includes a microphone 3, a key
means 5 (such as a key, display touch screen, joystick etc.)
or a radio transceiver for receiving text data in the form of
SMS messages or e-mails.

[0044] The controller 14 also navigates through the text
data in response to instructions 26 received from the user via
input 4, so that the loudspeaker outputs the desired speech.
Navigation may, for example, be forwards to skip text or
backwards to replay text. The navigation is performed so
that the text is broadcast by the loudspeaker 6 in logical
units. This is achieved by the controller parsing the text it
accesses from the memory 10. Parsing involves using
punctuation identifiers within the text to separate portions of the
text into logical units. Examples of punctuation identifiers
are those which indicate the end of a sentence, such as a full
stop (period), exclamation mark, question mark, capital
letter or consecutive spaces, and those which indicate a
logical break within a sentence, such as a comma, colon,
semi-colon or dash. Alternatively, a punctuation identifier
may indicate the end of a group of a predetermined number
of words. The portions of text between identifiers are sent
one at a time to the TTS engine 16. The controller maintains
a database to enable control of the navigation. Examples are
shown in FIGS. 9a and 9b of the accompanying drawings.
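The parsing step of [0044] can be sketched in Python. This is an illustrative assumption, not the patented implementation: it treats only a full stop, exclamation mark, question mark, comma, semi-colon or colon followed by whitespace as a punctuation identifier, and ignores the capital-letter, consecutive-space, dash and special-character identifiers the text also mentions. The name `split_units` is hypothetical.

```python
import re

# Assumed identifier set: . ! ? , ; : followed by whitespace. The mark stays
# with the unit it ends, so the TTS engine still sees it (e.g. for intonation).
_BOUNDARY = re.compile(r"(?<=[.!?,;:])\s+")


def split_units(text: str) -> list[str]:
    """Split text into the logical units sent one at a time to the TTS engine."""
    return [unit for unit in _BOUNDARY.split(text.strip()) if unit]


units = split_units("Hello Fred, thank you for your mail. "
                    "See you at two o'clock on Thursday!")
for index, unit in enumerate(units):
    print(index, unit)
```

Because a boundary requires trailing whitespace, a decimal point such as "3.5" does not start a new unit.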

[0045] In FIG. 9a the controller parses the text into groups
of five words. This is useful, for example, where the text
contains minimal or no punctuation marks. In this case, the
controller groups the words by recognising space characters
within the text and counting them. This may, for example, be
done by looking for the ASCII code for a space character. The
database has an entry for each of the 18 words in the phrase.
Each entry has two fields. The first field 91 records the count
of spaces, incrementing from one to five. The second field 92
records which text group the word entry belongs to, based on
the count in the first field 91, by storing a text group
identifier which is different for each group of five words.
Referring to FIG. 9a, there are four distinct text groups
having group identifiers 1, 2, 3 and 4. Group 1 includes the
words "Hello Fred, thank you for". Group 2 includes the
words "your mail I look forward". Group 3 includes the
words "to see you at two". Group 4 includes the words
"o'clock on Thursday".

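The FIG. 9a database can be modelled as one record per word holding the two fields described above. In this sketch the names `WordEntry`, `build_group_table` and `group_texts` are hypothetical, and Python's `str.split()` (which splits on runs of any whitespace) stands in for counting ASCII space characters:

```python
from typing import NamedTuple


class WordEntry(NamedTuple):
    word: str
    position: int  # field 91: position of the word within its group, 1..group_size
    group: int     # field 92: text group identifier, 1, 2, 3, ...


def build_group_table(text: str, group_size: int = 5) -> list[WordEntry]:
    """Build a FIG. 9a-style table, assigning every `group_size` words to a new group."""
    return [
        WordEntry(word, index % group_size + 1, index // group_size + 1)
        for index, word in enumerate(text.split())
    ]


def group_texts(table: list[WordEntry]) -> dict[int, str]:
    """Reassemble the text of each group, keyed by group identifier."""
    groups: dict[int, list[str]] = {}
    for entry in table:
        groups.setdefault(entry.group, []).append(entry.word)
    return {group: " ".join(words) for group, words in groups.items()}


table = build_group_table("Hello Fred, thank you for your mail I look forward "
                          "to see you at two o'clock on Thursday")
print(len(table))          # 18
print(group_texts(table))  # groups 1 to 4, as listed for FIG. 9a
```

Keeping the group identifier on every word lets the controller find the start of any group (the entry whose position is 1) without re-parsing the text.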
[0046] In operation the controller 14 forwards group 1 to
the TTS engine 16, next group 2, then group 3 and finally group 4.
During this time the controller 14 keeps track of which
group has been successfully output as synthesised speech. It may do
this by storing the identifier of each group forwarded
to the TTS engine 16. If the controller receives a navigation
instruction from the user, it navigates through the text to a
desired position and forwards the associated text group to
the TTS engine 16. For example, if the TTS engine is
outputting synthesised speech corresponding to group 3, and
the user inputs the backward instruction, then control signal
26 causes the controller to navigate back through the text to
the beginning of the last group to be output (or forwarded
to the TTS engine), and re-sends that group to the TTS engine 16
for conversion and output by the loudspeaker 6. That is,
assuming group 3 is currently being output, then in
response to a backward control signal 26 from the input 4,
the controller 14 navigates back through the text to the
beginning of group 3, to the word "to", and forwards text
group 3 to the TTS engine 16 again for output by the
loudspeaker 6 as synthesised speech. Assuming no further
instructions are received from the user, the controller 14
then forwards text group 4 to the TTS engine once the
group 3 text has been output. The controller 14 may also be arranged to
move back two groups in response to a backward command.
This may occur, for example, if an instruction is received
while the beginning of a text group is being output, for
example while the first or second word of a group is being
output. So if the word "see" in group 3, for example, is
being output when the controller receives the backward
instruction 26, the controller may navigate back to the
beginning of group 2 and forward that group to the TTS engine for
output.
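The backward rule of [0046] reduces to a small decision function. A minimal sketch, assuming the controller knows the identifier of the group being spoken and the 1-based position of the word currently being output; the two-word threshold is taken from the example above, and the name `backward_target` is hypothetical:

```python
def backward_target(current_group: int, word_position: int,
                    threshold: int = 2) -> int:
    """Group to re-send to the TTS engine after a backward instruction.

    current_group: identifier (1-based) of the group being spoken.
    word_position: 1-based position, within that group, of the word being spoken.
    If the instruction arrives while one of the first `threshold` words is
    being output, the user has heard little of this group and most likely
    wants the previous one, so step back one further group (never below 1).
    Otherwise replay the current group from its beginning.
    """
    if word_position <= threshold:
        return max(1, current_group - 1)
    return current_group


# FIG. 9a example: "see" is word 2 of group 3, "at" is word 4 of group 3.
print(backward_target(3, 2))  # 2
print(backward_target(3, 4))  # 3
```

A word-count threshold keeps the sketch independent of the TTS engine's timing; a time-based threshold, such as the two seconds mentioned in [0010], would compare elapsed playback time instead.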

[0047] Alternatively, the text replayed may be determined
by the time elapsed between sending the last group to the TTS
engine and receipt of the backward instruction, or by a specific
user input, such as two signals being received within a
predetermined period. These alternatives will be explained
further below.

[0048] Likewise, if a forward instruction is received, the
controller 14 navigates through the text and forwards the
next group to the TTS engine for speech output by the
loudspeaker 6. For example, if group 2 is currently being
output as synthesised speech and the user inputs a forward
instruction, then control signal 26 causes the controller to
navigate forward through the text to the beginning of the
next group to be output, namely group 3, and send that
group to the TTS engine for conversion to synthesised
speech for output by the loudspeaker. Thereby, the rest of the
group 2 text not already output by the loudspeaker is
skipped. Alternatively, if the end of group 2 is being output
(for example the words "look" or "forward") when a
forward instruction is received, then the controller may skip the
third group and forward the fourth group to the TTS engine
for conversion to speech for output by the loudspeaker 6.
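The forward rule of [0048] mirrors the backward one. In this sketch (hypothetical name `forward_target`, same assumed two-word threshold), reaching the end of the message returns `None`; clamping a two-group skip to the last group is an assumption the text does not address:

```python
from typing import Optional


def forward_target(current_group: int, word_position: int, group_length: int,
                   last_group: int, threshold: int = 2) -> Optional[int]:
    """Group to send to the TTS engine after a forward (skip) instruction.

    group_length: number of words in the current group (the final group may
    be shorter than five). If one of the last `threshold` words is being
    output, the next group would start almost at once anyway, so skip it
    and jump one group further. Returns None when no later group exists.
    """
    if current_group >= last_group:
        return None
    near_end = word_position > group_length - threshold
    target = current_group + (2 if near_end else 1)
    return min(target, last_group)


# FIG. 9a example: group 2 is "your mail I look forward".
print(forward_target(2, 3, 5, 4))  # 3  ("I" is being spoken)
print(forward_target(2, 4, 5, 4))  # 4  ("look" is being spoken)
```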

[0049] FIG. 3 illustrates a radio handset according to an 
embodiment of the present invention. The handset, which is 
generally designated 30, comprises the user interface having 
a keypad 32, a display 33, a power key 34, a speaker 35, and 
a microphone 36. The handset 30 according to this embodi- 
ment is adapted for communication via a wireless telecom- 
munication network, e.g. a cellular network. However, a 
handset could alternatively be designed for a cordless net- 
work. The keypad 32 has a first group of keys 37 which are 
alphanumeric keys and by means of which the user can input 
data. For example, the user can enter a telephone number, 
write a text message (e.g. SMS), write a name (associated 
with a phone number), etc. using these keys 37. Each of the 
12 alphanumeric keys 37 is provided with a figure "0" to "9",
"#" or "*", respectively. In alpha mode, each key is
associated with one or more letters and special signs used in
text editing. The keypad 32 additionally comprises two soft
keys 38a and 38b, two call handling keys 39, and a scroll key
31. 

[0050] The two soft keys 38 have functionality
corresponding to what is known from a number of handsets, such as the
Nokia 2110™, Nokia 6110™ and Nokia 8110™. The
functionality of the soft keys depends on the state of the handset
and the navigation in the menu by using the scroll key, for
example. The current functionality of the soft keys 38a and
38b is shown in separate fields in the display 33 just above
the keys 38.

[0051] The two call handling keys 39 may be used for
establishing a call or a conference call, terminating a call or 
rejecting an incoming call. 

[0052] The scroll key 31 in this embodiment is a key for 
scrolling up and down the menu. However other keys may 
be used instead of this scroll key and/or the soft keys, such
as a roller device or the like. 

[0053] FIG. 4 is a block diagram of part of the handset of 
FIG. 3 which facilitates understanding of the present inven- 
tion. As is conventional in a radio handset, it comprises 
speech circuitry in the form of user interface devices (micro- 
phone 36 and speaker 35), an audio part 44, transceiver 49, 
and a controller 48. The microphone 36 converts speech 
audio signals into corresponding analogue signals which in 
turn are converted from analogue to digital by an A/D 
converter (not shown). The audio part 44 then encodes the 
signal and, under control of the controller 48, forwards the 
encoded signal to the transceiver 49 for output to the 
communication network. 

[0054] In the reverse situation, an encoded speech signal 
which is received by a transceiver 49 is decoded by the audio 
part again under control of the controller 48. This time the 
decoded digital signal is converted into an analogue one by 
a D/A converter (not shown), and output by speaker 35. 

[0055] The controller 48 also forms an interface with 
peripheral units, such as memory 47 having a RAM memory 
47a and a flash ROM memory 47b, a SIM card 46, a display
33 and a keypad 32 (as well as data, power supply, etc). 



[0056] In this embodiment, the audio part 44 also
comprises a TTS engine which, together with the controller 48,
forms a processor, as in the FIG. 1 embodiment. The device
30 handles text-to-speech synthesis in much the same way as
described in connection with the corresponding parts in
FIG. 2.

[0057] Text may be input by the user via the keypad 32
and/or microphone 36, or received from the
communications network by the transceiver 49. The text
data received is stored in memory (RAM 47a). The
controller reads the memory and controls the TTS engine
accordingly. The controller also navigates through the text in
response to instructions received from the user via one or
more of the microphone 36, keypad 32 and navigation and
selection keys 45, so that the speaker 35 outputs the desired
speech in logical units.

[0058] In this embodiment, as well as outputting text as
speech, the handset also presents text on the display 33.
Consequently, the processor is responsible for controlling the
display driver to drive the display to present the appropriate
text. When it reads the memory 47a and controls the TTS
engine, the controller 48 also controls the display. Having
read text data from the memory, in this embodiment, the
controller provides it as an input to the TTS engine and
controls the display driver, by means of control signals 431,
to display the text data. The displayed text corresponds to the
text converted by the TTS engine. This is also the case when
a navigation instruction is received from the user. The
database used for controlling navigation is used for the
purpose of text output in general, and when displayed text
is desired the database is used to control the display
simultaneously with the TTS engine. In other
words, in the FIG. 9a database, for example, when the
controller sends a text group to the TTS engine, that text
group is also sent to the display driver for presentation on the
display.

[0059] A handset such as that in FIG. 3 would generally 
have a range of menu functions. The Nokia 6110, for 
example, can have the following menu functions: 

[0060] 1. Messages 

[0061] 2. Call Register 

[0062] 3. Profiles 

[0063] 4. Settings 

[0064] 5. Call divert 

[0065] 6. Games 

[0066] 7. Calculator 

[0067] 8. Calendar. 

[0068] To access the menus, the user can scroll through the
functions using the navigation and selection key 45 or using
appropriate pre-defined short cuts. In general, the left hand
soft key 38a will enable the user to navigate through
sub-menus and select options, whereas the right hand soft key
38b will enable the user to go back up the menu hierarchy.
The scroll key 31 can be used to navigate through the
options list in a particular menu/sub-menu prior to selection
using the left hand soft key 38a.

[0069] The messages menu may include functions relating
to text messages (such as SMS), voice messages, fax and
data calls, as well as service commands and the network's
information service messages. A typical function list may
be:

[0070] 1-1 Inbox 

[0071] 1-2 Outbox 

[0072] 1-3 Write Messages 

[0073] 1-4 Message Settings 

[0074] 1-5 Info Service 

[0075] 1-6 Fax or Data Call 

[0076] 1-7 Service Command Editor. 

[0077] In the present invention, the handset has a setting 
for text speech synthesis. This setting may be pre-defined or 
be a profile to be selected by the user. If the setting is "On", 
then the Inbox message function may comprise options for 
the user to listen to a received text message etc. FIG. 5a 
illustrates how a user may select a message stored in the 
message inbox and listen to it, whilst FIG. 5b illustrates how 
to navigate through the message. 

[0078] In this embodiment, the menu options are
displayed one at a time. The messages menu is the first option
and is presented on the display (stage 501). The user can
select this option by pressing the left soft key 38a
associated with the "select" function displayed. Alternatively, if
this option is not desired, the user can use the right hand
soft key 38b to go back to the main menu, or the scroll key to
scroll to an alternative option for selection, such as Call
Settings.

[0079] If the Messages option is selected, the first option
in the first sub-menu is displayed, namely Inbox (stage 502).
If the user selects this option by pressing the left soft key
38a, in this embodiment, the last three text messages are
displayed, with the most recently received message being presented
first in an options list (stage 503). This most recently received
message is the default option, which is selected if the left hand soft key
38a is pressed. This default option may be indicated by
being highlighted on the display. If the user wishes to read
one of the other messages, he can navigate to them using the
scroll key. Once a message has been selected, the user is
given the choice of listening to or reading the chosen message.
(The listen option may be listen only or listen and read,
depending on the handset configuration.) "Listen" is the
default option. This may be chosen by pressing the left hand
soft key 38a or the alphanumeric key "1". Alternatively, in a
preferred embodiment, the listen option may be
automatically selected in the absence of user input after a certain
period, for example two seconds. In the embodiment of FIG.
5a, the handset is configured to play and display the selected
message if the "Listen" option is selected (stage 505).

[0080] A number of further options are available in respect 
of the selected message depending upon the state of the 
handset. 

[0081] If the listen option is selected as in stage 504, then
during play of the message, the available options are forward
and backward navigation options, as described further with
respect to FIG. 5b. Once the message has finished playing
and a predetermined period has passed without further user input, the
options change to conventional text message options such as
erase, reply, edit, use number, forward, print via IR, details,
etc. (stage 506).



[0082] If the read option is selected, then the same options 
are available irrespective of whether the whole message is 
presented on the display for the user to read. 

[0083] Turning now to FIG. 5b, this illustrates receipt of 
an incoming message (rather than accessing one previously 
received as in stage 503 of FIG. 5a). 

[0084] When a message is received from the communica- 
tions network via the transceiver 49, the controller sends a 
control signal to the display driver for the display to present 
a menu option as shown in stage 507. If the user wishes to 
access a message whilst the handset is in this state, then the 
left soft key 38a is pressed. Depression of the right soft key, 
on the other hand, will exit this menu, and the stored 
messages can be viewed/listened to later via the stages 
shown in FIG. 5a. 

[0085] In the FIG. 5b embodiment, when the left soft key
is pressed, the received message is accessed. The user is then
given a choice to listen to or read the message (stage 508). In
this particular embodiment, the handset is configured only to
play the message, without displaying it, if the listen option is
selected (by pressing the left soft key or the alphanumeric key
"1"), and consequently the available navigation options are
presented on the
display (stage 509). The navigation options available in this 
embodiment are backwards and forwards options, with the 
backward option being the default. The backwards option 
may be selected by pressing the left soft key or the alpha- 
numeric key "1", or alternatively automatically when there 
has been no user input for a predetermined period. The 
forward option, on the other hand, may be selected by 
scrolling down once using the scroll key and then selecting 
using the left hand soft key 38a, or more quickly by pressing 
alphanumeric key "2". If either option is selected, in this 
embodiment, then a choice of backwards/forwards steps is 
given (stage 510). 
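
The key handling at stage 509 can be sketched as a small lookup. This is a minimal sketch under stated assumptions: the key names ("1", "2", "soft_left") and the function select_navigation_option are hypothetical, and a key value of None stands for the predetermined period expiring with no user input.

```python
from typing import Optional

# Hypothetical key names; the handset's actual key codes are not specified.
SHORTCUTS = {"1": "backward", "2": "forward"}  # alphanumeric shortcut keys
DEFAULT_OPTION = "backward"                     # default option at stage 509


def select_navigation_option(key: Optional[str], highlighted: str = DEFAULT_OPTION) -> str:
    """Resolve the navigation option chosen at stage 509.

    key is None when no input arrived within the predetermined period,
    in which case the default (backward) option is selected automatically.
    "soft_left" confirms whichever option the scroll key has highlighted.
    """
    if key is None:
        return DEFAULT_OPTION
    if key in SHORTCUTS:
        return SHORTCUTS[key]
    if key == "soft_left":
        return highlighted
    raise ValueError(f"unhandled key: {key!r}")
```

Stage 510 could plausibly reuse the same pattern, with keys "1" to "3" mapping to jump sizes and one jump as the default.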

[0086] In this case, jumps 1, 2 or 3 are available, and the 
desired jump may be selected using the appropriate alpha- 
numeric key or the left soft key, following the scroll key if 
appropriate. The jump by one position backwards or
forwards is the default, and may be automatically selected if
the user does not provide any input within a predetermined
period. The numbers 1-3 represent the number of jumps 
between punctuation identifiers in the chosen direction, as 
for example is described above with reference to FIGS. 9a 
and 9b. 
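
Jumping a given number of punctuation identifiers in either direction can be sketched as a search over the offsets at which speech may resume. This is a minimal sketch, assuming that the marks . , ; : ! ? (with any trailing spaces) are the punctuation identifiers and that positions are character offsets into the stored message; segment_starts and jump are hypothetical names.

```python
import bisect
import re
from typing import List

# Assumption: these marks (plus any trailing spaces) act as punctuation
# identifiers; claim 28 also allows capitals, spaces or headers.
IDENTIFIER = re.compile(r"[.,;:!?]\s*")


def segment_starts(text: str) -> List[int]:
    """Offsets at which speech can resume: the start of the text and the
    first character after each punctuation identifier."""
    starts = [0]
    for m in IDENTIFIER.finditer(text):
        if m.end() < len(text):
            starts.append(m.end())
    return starts


def jump(text: str, current: int, steps: int, backwards: bool) -> int:
    """Return the offset `steps` identifiers away from offset `current`."""
    starts = segment_starts(text)
    if backwards:
        # Number of resume points strictly before `current`; the first one
        # back is the start of the segment currently being spoken.
        before = bisect.bisect_left(starts, current)
        return starts[max(before - steps, 0)]
    after = bisect.bisect_right(starts, current)  # first index past `current`
    i = after + steps - 1
    return starts[i] if i < len(starts) else len(text)
```

With the current position inside "How are you?" in the text "Hello there. How are you? Fine, thanks.", one step back resumes at "How" and one step forward resumes at "Fine".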

[0087] As mentioned above, in the FIG. 5b embodiment 
the listen option is listen only and hence once the listen 
option is selected (stage 508), the backwards and forwards 
options are presented on the display (stage 509). In contrast, 
in the FIG. 5a embodiment, the listen option is listen and 
read (play and display) and hence once the listen option is 
selected, the message is displayed on the display (stage 505). 

[0088] In the FIG. 5a situation when the user selects the 
"listen" option, "options" can be selected using the left soft 
key 38a to present navigation options on the display (as in 
stage 509 of the FIG. 5b embodiment). Likewise, a choice
from these options can be made in the same way as for the 
navigation option of the FIG. 5b embodiment (stage 509) 
and the number of steps, 1, 2 or 3, as in stage 510. 

[0089] Alternatively, when the message is being played, 
shortcut keys, alphanumeric keys 1 and 2, can be pressed to 
automatically select the desired navigation option. Once a 
navigation option has been selected, the choice of number of
backwards/forwards steps is presented to the user as in stage 
510 of the FIG. 5b embodiment. 

[0090] FIG. 6 illustrates navigation through the text and
subsequent output of selected portions of the text. According
to this embodiment, the controller 48 determines
whether the user has selected the message listening option 
(step 601). If this is the case, the controller 48 reads text data 
from the memory 47 and controls the TTS engine to play the 
stored message over the speaker 35 (step 602). Whilst the 
message is being played, the controller checks for any input 
commands from the user (step 604). If no command is 
detected, then the controller continues to forward the
message to the TTS engine until the end of the message is
reached (step 603), at which point playing stops. If, on the other
hand, the controller detects the input of a command, it 
determines the type of command. In this embodiment, the 
controller firstly detects whether the command is a back- 
wards command. If it is, the controller then determines the 
position to move back to (step 606), moves to that position 
(step 607), and the TTS engine plays the message from that 
position (step 608). For example, the controller identifies a 
punctuation identifier, reads the message stored in memory 
from that identifier and forwards that part of the message to 
the input of the TTS engine for replay. 

[0091] If the command is not a backwards command, then 
the controller determines whether the command is a for- 
wards command (step 609). If so, then the controller deter- 
mines the position to move forward to (step 610), moves to 
that position (step 607) and the TTS engine plays the 
message from that position (step 608). For example, the 
controller identifies the punctuation identifier, jumps to the 
part of the message from that identifier in the memory and 
forwards it to the input of the TTS engine for speech output. 
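
The FIG. 6 control flow can be sketched as a loop that feeds the TTS engine one punctuation-delimited segment at a time and checks for commands between segments. This is a simplified sketch under stated assumptions: the message is pre-split into segments, and the callbacks speak (standing in for the TTS engine) and poll (returning a pending command or None), as well as the ("back" or "forward", steps) command encoding, are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical command encoding: ("back" or "forward", number of identifiers).
Command = Tuple[str, int]


def play_message(
    segments: List[str],
    speak: Callable[[str], None],
    poll: Callable[[], Optional[Command]],
) -> None:
    """Sketch of the FIG. 6 loop over punctuation-delimited segments.

    A command returned by poll() is treated as having arrived while
    segments[pos] was being spoken, so ("back", 1) restarts that segment.
    """
    pos = 0
    while pos < len(segments):       # step 603: stop at the end of the message
        speak(segments[pos])         # steps 602/608: feed the TTS engine
        cmd = poll()                 # step 604: check for a user command
        if cmd is None:
            pos += 1                 # no command: continue with the next segment
        elif cmd[0] == "back":       # backwards command (steps 606-607)
            pos = max(pos - (cmd[1] - 1), 0)
        elif cmd[0] == "forward":    # forwards command (steps 609-610, 607)
            pos += cmd[1]
        else:
            pos += 1                 # unrecognised command: ignore it
```

Because each segment is spoken whole before polling, a forward jump of one is indistinguishable here from letting playback continue; a practical controller would likely poll while the TTS engine is still mid-segment.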

[0092] FIG. 7 illustrates various methods of inputting a 
repeat command. The controller 48 determines whether the 
user has selected the message listening option (step 701). If 
this is the case, the controller 48 reads the text data from the 
memory 47 and controls the TTS engine to play the stored 
message over the speaker 35 (step 702). Whilst the message 
is being played, the controller checks whether a backwards 
input command has been received from the user (step 704). 
If no command is detected then the controller continues to 
forward the message to the TTS engine until the end of the
message is reached (step 703), at which point playing stops.

[0093] If, on the other hand, the controller detects a 
backwards input command, it goes on to determine the point 
from which the message is to be replayed. Four alternatives 
are illustrated in the flow chart of FIG. 7. These are 
illustrated as a string of steps in this flow chart, but it will
be appreciated that a handset may implement any one of them,
or any combination of them.

[0094] Firstly, the controller determines whether a dedicated
key is pressed (step 705). If so, it goes on to determine
how many key presses (N) the user has made (step 706) and
determines the position of the Nth punctuation identifier
back. For example, if the user presses the dedicated key 
twice, then the controller determines the position of the 
second punctuation identifier in the backwards direction 
from the current position. 
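
The text leaves open how the controller decides that a run of dedicated-key presses has ended. One plausible approach, sketched here purely as an assumption, treats presses as a single command while each follows the previous within a short gap; count_presses and the 0.6 second MAX_GAP_S value are hypothetical.

```python
from typing import List

# Hypothetical: presses belong to one command while each follows the
# previous press within this gap (in seconds).
MAX_GAP_S = 0.6


def count_presses(timestamps: List[float], max_gap: float = MAX_GAP_S) -> int:
    """Return N, the number of presses in the first burst of key presses."""
    if not timestamps:
        return 0
    n = 1
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > max_gap:
            break  # a longer pause ends the burst
        n += 1
    return n
```

The resulting N would then select the Nth punctuation identifier back from the current position.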

[0095] Secondly, the controller detects whether a function
key corresponding to an input command is pressed. If so, it
determines how many backward steps are selected (S) (step
711) and determines the position of the Sth punctuation
identifier back (step 712). For example, the controller may
identify selection of a certain number of steps (S) using the
scroll key 31 and left soft key 38 as described with reference
to stage 510 of FIG. 5(c) above.

[0096] Thirdly, the controller may determine whether an 
alphanumeric key is pressed subsequent to a backwards 
command input (step 720) and if so determines the digit (D) 
associated with the key press (step 721) and determines the 
position of the Dth punctuation identifier back (step 722).

[0097] For example, the controller may detect pressing of 
the alphanumeric key "1" and determine the position of the
previous punctuation identifier on that basis. 

[0098] Fourthly, the controller may determine whether a 
voice command is input (step 730), and if so the controller 
will determine how many backward steps (R) have been 
requested (step 731) and thus determine the position of the Rth
punctuation identifier back. This can be achieved using 
conventional voice recognition technology. 

[0099] Once the desired position has been determined, the 
controller moves back to that position (step 708) and the 
TTS engine plays the message from that position (step 709). 
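
The four FIG. 7 alternatives all reduce to a number of identifiers to move back, after which the same position lookup applies. This is a minimal sketch, assuming the voice command has already been transcribed to text by a recogniser; the route names, NUMBER_WORDS vocabulary, steps_back and position_back are hypothetical, and positions are offsets into the list of resume points described for FIG. 6.

```python
import bisect
from typing import List, Union

# Hypothetical vocabulary for a voice command that has already been
# transcribed by a speech recogniser (steps 730-731).
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def steps_back(source: str, value: Union[int, str]) -> int:
    """Map each FIG. 7 input route to a number of identifiers to move back."""
    if source in ("dedicated", "menu"):  # N key presses (706) or S steps (711)
        return int(value)
    if source == "digit":                # digit D on the keypad (721)
        return int(str(value))
    if source == "voice":                # R spoken by the user (731)
        for word in str(value).lower().split():
            if word.isdigit():
                return int(word)
            if word in NUMBER_WORDS:
                return NUMBER_WORDS[word]
        return 1  # assumption: a bare "back" means one identifier
    raise ValueError(f"unknown input route: {source!r}")


def position_back(starts: List[int], current: int, steps: int) -> int:
    """Offset of the steps-th identifier before `current` (step 708)."""
    before = bisect.bisect_left(starts, current)
    return starts[max(before - steps, 0)]
```

Keeping the input route separate from the position lookup mirrors the point made above that a handset may implement any one of the alternatives, or any combination of them.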

[0100] FIG. 8 illustrates a method of repeating text 
according to a preferred embodiment of the present inven- 
tion. 

[0101] The controller 48 determines whether the user has 
selected the message listening option (step 801). If this is the 
case, the controller 48 reads the text data from the memory 
47 and controls the TTS engine to play the stored message 
(step 802). Whilst the message is being played, the controller 
checks for a backwards command from the user (step 804). 
If no command is detected then the controller continues to 
forward the message to the TTS engine until the end of the
message is reached (step 803), at which point playing stops.

[0102] If, on the other hand, the controller detects a
backwards command input, it then goes on to determine
whether a dedicated key is pressed (step 805). The controller
is arranged to control playback from an earlier punctuation
identifier if the first identifier back from the position at the
time of the backward command is close to that position and
the user inputs a further backward command within a
certain time frame from the first command. This is achieved
by the controller comparing the period between the present
position and the position of the previous punctuation
identifier with a threshold (step 805) in response to the
detection of the pressing of the dedicated key (step 804),
and then checking whether the key is pressed again within a
certain period (e.g. two seconds from the previous key press)
(step 809). If this is the case, then the controller moves to the
position of the second punctuation identifier back from the
current position (step 810). Otherwise, if either the period
between the present position and the position of the previous
punctuation identifier is not less than the threshold (step 806)
or the key is not pressed again within the predetermined period
from the first key press (step 810), the controller moves to the
position of the previous punctuation identifier from the current
position. In either case, the controller reads the message from
the appropriate punctuation identifier in the memory and
forwards the message from that point to the input of the TTS
engine for output (step 808).
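
The FIG. 8 decision can be sketched as a small function. This is a minimal sketch, assuming positions are expressed as playback times in seconds (claim 7 also allows a length of text instead); the 1.0 second THRESHOLD_S value and the name replay_target are hypothetical, while the two-second repeat window follows the example given above.

```python
import bisect
from typing import List, Optional

THRESHOLD_S = 1.0      # hypothetical "close to" threshold (step 806)
REPEAT_WINDOW_S = 2.0  # the two-second example given for the repeat press


def replay_target(
    boundary_times: List[float],
    now: float,
    second_press_delay: Optional[float],
    threshold: float = THRESHOLD_S,
    window: float = REPEAT_WINDOW_S,
) -> float:
    """Return the playback time of the identifier to replay from.

    second_press_delay is the time between the first and second presses
    of the dedicated key, or None if the key was pressed only once.
    """
    idx = bisect.bisect_left(boundary_times, now)  # identifiers before `now`
    if idx == 0:
        return 0.0
    first_back = boundary_times[idx - 1]
    close = (now - first_back) < threshold
    pressed_again = second_press_delay is not None and second_press_delay <= window
    if close and pressed_again and idx >= 2:
        return boundary_times[idx - 2]  # second identifier back
    return first_back                   # previous identifier
```

For identifiers at 0.0, 3.0 and 6.5 seconds and a command at 7.0 seconds, a second press 1.2 seconds later replays from 3.0, while a single press replays from 6.5.
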

[0103] The present invention includes any novel feature or
combination of features disclosed herein either explicitly or 
any generalisation thereof irrespective of whether or not it 
relates to the claimed invention or mitigates any or all of the 
problems addressed. 

[0104] In view of the foregoing description it will be 
evident to a person skilled in the art that various modifica- 
tions may be made within the scope of the invention. For 
example, whilst the examples show a mobile communica- 
tions environment, the invention is equally applicable to 
other environments. In short, the invention would apply to
any text-to-speech service. One such case is an application of
the invention running on a telco service server connected to
a PSTN and accessed using a phone such as a mobile phone.
Speech synthesis could then be controlled using DTMF 
tones. 
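
How DTMF tones might drive navigation is not specified. One hypothetical scheme, sketched here as an assumption only, mirrors the handset shortcuts: a first tone "1" or "2" selects backward or forward, and an optional second tone "1" to "3" selects the number of jumps, defaulting to one as at stage 510. The mapping and the name parse_dtmf are illustrative.

```python
from typing import Optional, Tuple

# Hypothetical mapping mirroring the handset shortcut keys "1" and "2".
DIRECTION_TONES = {"1": "back", "2": "forward"}


def parse_dtmf(tones: str) -> Optional[Tuple[str, int]]:
    """Translate a short DTMF digit sequence into a navigation command."""
    if not tones or tones[0] not in DIRECTION_TONES:
        return None  # not a navigation command
    direction = DIRECTION_TONES[tones[0]]
    steps = 1  # default jump, as at stage 510
    if len(tones) > 1 and tones[1] in "123":
        steps = int(tones[1])
    return direction, steps
```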

What is claimed is: 

1. An electronic device comprising: 

a speech synthesizer including a loudspeaker, arranged to 
convert an input dependent upon punctuated text, to an 
audio output representative of a human vocally repro- 
ducing the text; 

a user input device for inputting instructions to navigate 
through text, between positions defined by punctuation 
identifiers of the text, to a desired position; and 

a controller arranged to control navigation to the desired 
position and provide the speech synthesizer with an 
input corresponding to a portion of the text from the 
desired position, in response to input navigation 
instructions. 

2. A device as claimed in claim 1, further comprising a 
display for displaying text. 

3. A device as claimed in claim 1 or 2, arranged to 
navigate backwards through the text. 

4. A device as claimed in claim 3, wherein the controller 
is arranged to navigate backwards to a position defined by a 
predetermined punctuation identifier in response to an input 
to the user input device. 

5. A device as claimed in claim 4, wherein the controller 
is arranged to navigate backwards to a position defined by 
the first punctuation identifier in the backwards sequence. 

6. A device as claimed in claim 4, wherein the controller 
is arranged to navigate backwards to a position defined by 
the second punctuation identifier in the backwards sequence. 

7. A device as claimed in any of claims 4 to 6, further 
comprising means for determining the length of text and/or 
length of time for audible reproduction of the text between 
the current position and the position defined by the first 
punctuation identifier in the backwards sequence and, if the 
length is below a threshold, the controller is arranged to 
navigate backwards to a position defined by a second 
punctuation identifier in the backwards sequence. 

8. A device as claimed in any of claims 3 to 7, wherein the 
controller controls the speech synthesizer to provide an 
audio output of the text between the current position and the 
position defined by the predetermined punctuation identifier 
at a slower speed than a default speed. 

9. A device as claimed in claim 8, when dependent upon 
claim 2, wherein the default speed is that of the display of 
text on the display. 

10. A device as claimed in claim 8, wherein the default 
speed is the default speed of the audio output of text by the 
speech synthesizer.

11. A device as claimed in claim 1 or 2, arranged to 
navigate forwards through the text. 

12. A device as claimed in claim 11, wherein the control- 
ler is arranged to navigate forwards to a position defined by 
a predetermined punctuation identifier in response to an 
input to the user input device. 

13. A device as claimed in claim 12, wherein the control- 
ler is arranged to navigate forwards to a position defined by 
the first punctuation identifier in the forwards sequence. 

14. A device as claimed in claim 12, wherein the control- 
ler is arranged to navigate forwards to a position defined by 
the second punctuation identifier in the forwards sequence. 

15. A device as claimed in any of claims 12 to 14, further 
comprising means for determining the length of text and/or 
length of time for audible reproduction of the text between 
the current position and the position defined by the first 
punctuation identifier in the forwards sequence and, if the 
length is below a threshold, the controller is arranged to 
navigate forwards to a position defined by a second punc- 
tuation identifier in the forwards sequence. 

16. A device as claimed in any preceding claim, arranged 
to navigate forwards through the text in response to a first 
instruction and backwards through the text in response to a 
second instruction. 

17. A device as claimed in any preceding claim, wherein 
the user input device comprises a key means. 

18. A device as claimed in claim 17, wherein the key 
means is a dedicated navigation instruction key. 

19. A device as claimed in claim 18, wherein the control 
means is arranged to determine the number of key actua- 
tions, and determine the position of the punctuation identi- 
fier associated with that number of key presses. 

20. A device as claimed in claim 17, wherein the key 
means comprises a multifunction key, and the controller 
controls the functionality of the multifunction key. 

21. A device as claimed in claim 20, wherein one function 
of the multifunction key is selecting a navigation instruction. 

22. A device as claimed in claim 21, wherein the control 
means is arranged to determine the position of the punctua- 
tion identifier associated with the navigation instruction 
selected by the multifunction key.

23. A device as claimed in claim 21 or 22, arranged to 
provide the user with a navigation instruction options menu 
and for the user to select from the menu using the multi- 
function key. 

24. A device as claimed in claim 21 or 22, arranged such 
that the user inputs the navigation instruction via the user 
input device. 

25. A device as claimed in any preceding claim, wherein 
the user input device comprises a voice recognition device. 

26. A device as claimed in claim 21 or 22, arranged such 
that the user inputs the navigation instruction by way of a 
voice command. 

27. A device as claimed in any of claims 20 to 26, wherein 
the instruction is a number, and the control means is 
arranged to determine the position of the punctuation iden- 
tifier associated with that number. 

28. A device as claimed in any preceding claim, wherein 
the punctuation identifiers are one or more selected from 
punctuation marks, capital letters, spaces, a header of a 
group of words. 

29. A device as claimed in any preceding claim, wherein
the electronic device is a document reader or a portable 
and/or hand-held communications device. 

30. A portable radio communications device comprising: 

a speech synthesizer including a loudspeaker, arranged to 
convert an input dependent upon punctuated text, to an 
audio output representative of a human vocally repro- 
ducing the text; 

a user input device for inputting instructions to navigate 
through text, between positions defined by punctuation 
identifiers of the text, to a desired position; and 

a controller arranged to control navigation to the desired 
position and provide the speech synthesizer with an 
input corresponding to a portion of the text from the 
desired position, in response to input navigation 
instructions. 

31. A device as claimed in claim 30, which is a hand-held 
device. 

32. A device as claimed in claim 30 or 31, comprising 
means for mounting in a vehicle. 

33. A document reader comprising: 

a speech synthesizer including a loudspeaker, arranged to 
convert an input dependent upon punctuated text, to an 
audio output representative of a human vocally repro- 
ducing the text; 

a user input device for inputting instructions to navigate 
through text, between positions defined by punctuation 
identifiers of the text, to a desired position; and 

a controller arranged to control navigation to the desired 
position and provide the speech synthesizer with an 
input corresponding to a portion of the text from the 
desired position, in response to input navigation 
instructions. 

34. A car having a device as claimed in any of claims 1 
to 32, or a document reader as claimed in claim 33. 

35. A car as claimed in claim 34, wherein the user input 
device comprises key means on the steering wheel. 

36. A device substantially as hereinbefore described with
reference to and/or as illustrated in the figures of the 
accompanying drawings. 

37. A method of navigating through text to a desired 
position for audio output by a speech synthesizer, the 
method comprising: 

detecting instructions input by a user to navigate through 
text, between positions defined by punctuation identi- 
fiers of the text, to a desired position; 

controlling navigation to the desired position; and 

providing the speech synthesizer with an input corre- 
sponding to a portion of the text from the desired 
position. 

38. A method for providing speech synthesis of a desired 
portion of text, the method comprising: 

determining a desired start position from a selection 
defined by punctuation identifiers, from an instruction 
input by a user; 

moving to the desired start position; 

outputting speech synthesized text from that position. 

39. A method of navigating through text to a desired 
position for audio output by a speech synthesizer substan- 
tially as hereinbefore described with reference to, and/or as 
illustrated in any one, or any combination of FIGS. 5 to 8 of 
the accompanying drawings. 
